Reliable Negative Extracting Based on kNN for Learning from Positive and Unlabeled Examples
Authors
Abstract
Many real-world classification applications fall into the class of positive and unlabeled (PU) learning problems. Almost all existing techniques are based on a two-step strategy. This paper proposes a new algorithm for extracting reliable negative examples in step 1. We use the kNN algorithm to rank unlabeled examples by their similarity to their k nearest positive examples, and set a threshold: unlabeled examples whose similarity falls below it are labeled as reliable negative examples, instead of following the common approach of labeling positive examples. In step 2, we use an iterative SVM technique to refine the final classifier. The proposed method is simple and efficient, and to some extent independent of k. Experiments on the popular Reuters-21578 collection show the effectiveness of the proposed technique.
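The step 1 procedure described in the abstract (rank unlabeled examples by kNN similarity to the positive set, then threshold) can be sketched as follows. This is a minimal pure-Python sketch under assumptions the abstract does not state: examples are dense feature vectors, similarity is cosine similarity, an unlabeled example's score is the mean similarity to its k nearest positive examples, and the threshold is a fixed user-supplied value. The function names `cosine`, `knn_positive_score`, and `extract_reliable_negatives` are hypothetical; the paper's actual scoring rule and threshold selection may differ, and the iterative SVM of step 2 is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed measure); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def knn_positive_score(u: Vector, positives: List[Vector], k: int) -> float:
    """Hypothetical scoring rule: mean similarity of u to its k nearest positives."""
    sims = sorted((cosine(u, p) for p in positives), reverse=True)
    top = sims[: min(k, len(sims))]  # use all positives if k exceeds their count
    return sum(top) / len(top)


def extract_reliable_negatives(
    positives: List[Vector],
    unlabeled: List[Vector],
    k: int,
    threshold: float,
) -> Tuple[List[int], List[Tuple[int, float]]]:
    """Step 1 sketch: score every unlabeled example against the positive set,
    rank from least to most similar, and return the indices scoring below
    `threshold` as reliable negatives (plus the full ranking)."""
    if not positives:
        raise ValueError("need at least one positive example")
    if k < 1:
        raise ValueError("k must be >= 1")
    scored = [(i, knn_positive_score(u, positives, k)) for i, u in enumerate(unlabeled)]
    ranking = sorted(scored, key=lambda t: t[1])  # least similar first
    reliable_negatives = [i for i, s in ranking if s < threshold]
    return reliable_negatives, ranking


if __name__ == "__main__":
    # Toy data: positives point roughly along the first two axes.
    P = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [1.0, 1.0, 0.0]]
    U = [[0.95, 1.0, 0.05], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [0.8, 0.7, 0.3]]
    rn, ranking = extract_reliable_negatives(P, U, k=2, threshold=0.5)
    print(rn)  # [1, 2]
```

In this toy run, the two unlabeled vectors dominated by the third feature score far below the threshold and become reliable negatives, while the other two stay unlabeled. In the two-step scheme, the extracted set would then seed the step 2 classifier, with the positives as the other class.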
Similar resources
A Novel Reliable Negative Method Based on Clustering for Learning from Positive and Unlabeled Examples
This paper investigates a new approach for training text classifiers when only a small set of positive examples is available together with a large set of unlabeled examples. The key feature of this problem is that there are no negative examples for learning. Recently, a few reported techniques have been based on building a classifier in two steps. In this paper, we introduce a novel method ...
Clustering-based Method for Positive and Unlabeled Text Categorization Enhanced by Improved TFIDF
PU learning occurs frequently in Web pages classification and text retrieval applications because users may be interested in information on the same topic. Collecting reliable negative examples is a key step in PU (Positive and Unlabeled) text classification, which solves a key problem in machine learning when no labeled negative examples are available in the training set or negative examples a...
Positive Unlabeled Learning for Deceptive Reviews Detection
Deceptive reviews detection has attracted significant attention from both business and research communities. However, due to the difficulty of the human labeling needed for supervised learning, the problem remains highly challenging. This paper proposes a novel angle on the problem by modeling it as PU (positive unlabeled) learning. A semi-supervised model, called mixing population and individual p...
Ensemble Based Positive Unlabeled Learning for Time Series Classification
Many real-world applications in time series classification fall into the class of positive and unlabeled (PU) learning. Furthermore, in many of these applications, not only are the negative examples absent, the positive examples available for learning can also be rather limited. As such, several PU learning algorithms for time series classification have recently been developed to learn from a s...
Learning to Rank Biomedical Documents with only Positive and Unlabeled Examples: A Case Study
In the text mining field, obtaining training data requires human experts' labeling efforts, which is often time-consuming and expensive. Supervised learning with only a small number of positive examples and a large amount of easily obtained unlabeled data has attracted booming interest in the field. A recently proposed relabeling method, which assumes unlabeled data as negative data for...
Journal title:
- JCP
Volume 4, Issue
Pages: -
Publication date: 2009